import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf

pd.options.mode.chained_assignment = None  # default='warn'
Problem statement. The goal of this homework is to familiarize students with either time series prediction methods or generative adversarial networks through a mini research project. To get full credit, submit a solution to either Problem A (time series prediction methods) or Problem B (generative adversarial networks) described below, not both. Both problems assume some independent reading and literature research; the lecture notes and the papers provided in the syllabus will be helpful. Problem B assumes independent study of PyTorch or TensorFlow for training neural networks (in Problem A this is required only for extra credit). Below are instructions on how to install PyTorch locally and some PyTorch tutorials; similarly, instructions on how to install TensorFlow locally and some TensorFlow tutorials.
Problem A. Financial time series prediction (100 points)
tickers_list = ['AMZN', 'GOOG', 'AAL', 'NCLH']
tickers_df = {}
for ticker in tickers_list:
    tickers_df[ticker] = yf.download(tickers=[ticker], start="2016-01-01", end="2020-12-31", interval="1d")
tickers_df['AMZN'].tail()
| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2020-12-23 | 160.250000 | 160.506500 | 159.208496 | 159.263504 | 159.263504 | 41876000 |
| 2020-12-24 | 159.695007 | 160.100006 | 158.449997 | 158.634506 | 158.634506 | 29038000 |
| 2020-12-28 | 159.699997 | 165.199997 | 158.634506 | 164.197998 | 164.197998 | 113736000 |
| 2020-12-29 | 165.496994 | 167.532501 | 164.061005 | 166.100006 | 166.100006 | 97458000 |
| 2020-12-30 | 167.050003 | 167.104996 | 164.123505 | 164.292496 | 164.292496 | 64186000 |
Predict daily stock volumes.
1. (10 pts) Plot time series for stock volumes and close prices for the above time periods. List observations of the data patterns - what kind of properties should a model have in order to be able to predict stock volumes and close prices well? Comment on the distributional shift observations in 2020 - how would you enhance your models for 2020 to improve performance?
fig, ax = plt.subplots(figsize=(13, 5))
for ticker in tickers_list:
    tickers_df[ticker]['Volume'].plot(label=ticker)
plt.legend(loc='best')
plt.title("Stock Volumes over time")
plt.xlabel("Date")
plt.ylabel("Stock Volumes")
plt.show()
for ticker in tickers_list:
    tx = tickers_df[ticker][['Volume']]
    # Reindex to calendar days so every year has the same day-of-year axis;
    # non-trading days take the most recent trading day's value.
    idx = pd.date_range('2016-01-01', '2020-12-31', freq='D')
    tx = tx.reindex(idx, method='ffill').bfill()
    tx['dayofyear'] = tx.index.dayofyear
    tx['year'] = tx.index.year
    tx = tx.reset_index()
    tx = tx.pivot_table(index='dayofyear', columns='year', values='Volume')
    tx.plot(figsize=(13, 5))
    plt.legend(loc='best')
    plt.title(ticker + ": Stock Volumes by year")
    plt.xlabel("Day of year")
    plt.ylabel("Stock Volumes")
fig, ax = plt.subplots(figsize=(13, 5))
for ticker in tickers_list:
    tickers_df[ticker]['Adj Close'].plot(label=ticker)
plt.legend(loc='best')
plt.title("Closing price over time")
plt.xlabel("Date")
plt.ylabel("Closing price")
plt.show()
for ticker in tickers_list:
    tx = tickers_df[ticker][['Adj Close']]
    # Reindex to calendar days so every year has the same day-of-year axis;
    # non-trading days take the most recent trading day's value.
    idx = pd.date_range('2016-01-01', '2020-12-31', freq='D')
    tx = tx.reindex(idx, method='ffill').bfill()
    tx['dayofyear'] = tx.index.dayofyear
    tx['year'] = tx.index.year
    tx = tx.reset_index()
    tx = tx.pivot_table(index='dayofyear', columns='year', values='Adj Close')
    tx.plot(figsize=(13, 5))
    plt.legend(loc='best')
    plt.title(ticker + ": Closing price by year")
    plt.xlabel("Day of year")
    plt.ylabel("Closing price")
2. (30 pts) Using an $N$-day sliding window, use the $N$-day average and $N$-day median methods to predict daily stock volumes for the $(N+1)$-st day in 2019 and 2020 for $N=10,30,60$, namely:
$$
\begin{aligned}
y_{N+1} &= \frac{y_1+y_2+\ldots+y_N}{N} \\
y_{N+1} &= \operatorname{median}\left(y_1, y_2, \ldots, y_N\right)
\end{aligned}
$$
Analyze the prediction error against realized volumes on the same days: compute the mean square error by month. Also calculate the mean square error for banking holidays vs. ordinary business days. Do you observe any pattern in which $N$ works best? Can you comment on why? Do you see any difference across stocks? Elaborate on your findings. You will likely notice that the mean square error is smaller for ordinary business days than for banking holidays. You will also likely notice an increase in mean square error during the distributional shift caused by the Covid shock in 2020.
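The two baselines above can be sketched on a toy series as follows. This is a minimal illustration assuming only pandas and NumPy; the helper `rolling_forecast` and the toy values are hypothetical and not part of the assignment. The `shift(1)` is the important step: it makes the forecast for day $N+1$ depend only on days $1,\ldots,N$.

```python
import numpy as np
import pandas as pd

def rolling_forecast(y: pd.Series, n: int, how: str = "mean") -> pd.Series:
    """Forecast each day from the previous n observations (mean or median)."""
    roll = y.rolling(window=n)
    stat = roll.mean() if how == "mean" else roll.median()
    # shift(1): the statistic over days t-n..t-1 becomes the forecast for day t
    return stat.shift(1)

# Toy volume series with one spike (hypothetical values, for illustration only)
y = pd.Series([10.0, 12.0, 11.0, 50.0, 13.0, 12.0],
              index=pd.date_range("2019-01-01", periods=6, freq="D"))

pred_mean = rolling_forecast(y, n=3, how="mean")
pred_median = rolling_forecast(y, n=3, how="median")

# Mean square error; the NaN warm-up rows are skipped by .mean()
mse_mean = ((pred_mean - y) ** 2).mean()
mse_median = ((pred_median - y) ** 2).mean()
```

On this toy series the median forecast recovers faster after the spike than the mean forecast, which is one effect worth checking for in the real volume data.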
squared_error = {2019: [], 2020: []}
for ticker in tickers_list:
    for window_size in [10, 30, 60]:
        for year in [2019, 2020]:
            df_ = tickers_df[ticker][['Volume']].copy()
            df_.columns = ['Actual Volume']
            # Forecast for day t = mean of the previous window_size days.
            # shift(1) keeps day t itself out of its own forecast.
            df_['Predicted Volume'] = df_['Actual Volume'].rolling(window=window_size).mean().shift(1)
            # Roll over the full history first, then keep the target year,
            # so early-January forecasts can use the preceding year's data.
            df_ = df_[df_.index.year == year]
            df_.plot(figsize=(6, 3))
            plt.legend(loc='best')
            plt.title(ticker + ": Volume forecast using " + str(window_size) + "-day rolling mean", fontsize=10)
            plt.xlabel("Date")
            plt.ylabel("Volume")
            plt.show()
            # Squared error of the forecast against the realized volume
            col = ticker + "_" + str(window_size) + "_" + str(year)
            df_[col] = np.square(df_['Predicted Volume'] - df_['Actual Volume'])
            squared_error[year].append(df_[col])
# holidays on which NASDAQ is open
banking_holidays = [
    '2019-10-14',  # Columbus Day
    '2019-11-11',  # Veterans Day
    '2019-11-29',  # Friday after Thanksgiving
    '2019-12-31',  # Dec 31
    '2020-10-12',  # Columbus Day
    '2020-11-11',  # Veterans Day
    '2020-11-27',  # Friday after Thanksgiving
    '2020-12-31',  # Dec 31
]
for year in [2019, 2020]:
    tx = pd.concat(squared_error[year], axis=1)
    for ticker in tickers_list:
        colnames = [col for col in tx.columns if ticker in col]
        tx_ = tx[colnames].copy()
        tx_.columns = ['10-day window', '30-day window', '60-day window']
        ty = tx_.copy()
        # Mean squared error by month
        tx_ = tx_.groupby(pd.Grouper(freq='M')).mean()
        tx_ = tx_.rename(index=lambda x: x.strftime('%B'))
        tx_.plot(kind='bar', title=ticker + ": Mean-squared error by month in " + str(year), figsize=(6, 3))
        # Mean squared error on holidays vs. ordinary business days
        holidays_list_ = pd.to_datetime(banking_holidays)
        ty_holidays = ty.loc[ty.index.isin(holidays_list_)]
        ty_working = ty[~ty.index.isin(holidays_list_)]
        holidays = ty_holidays.mean(axis=0).to_frame()
        holidays.columns = ['Holidays']
        working = ty_working.mean(axis=0).to_frame()
        working.columns = ['Business Days']
        pd.concat([holidays, working], axis=1).T.plot(kind='barh', title=ticker + ": Mean-squared error by business-days & holidays in " + str(year), figsize=(6, 3))
squared_error = {2019: [], 2020: []}
for ticker in tickers_list:
    for window_size in [10, 30, 60]:
        for year in [2019, 2020]:
            df_ = tickers_df[ticker][['Volume']].copy()
            df_.columns = ['Actual Volume']
            # Forecast for day t = median of the previous window_size days.
            # shift(1) keeps day t itself out of its own forecast.
            df_['Predicted Volume'] = df_['Actual Volume'].rolling(window=window_size).median().shift(1)
            # Roll over the full history first, then keep the target year,
            # so early-January forecasts can use the preceding year's data.
            df_ = df_[df_.index.year == year]
            df_.plot(figsize=(6, 3))
            plt.legend(loc='best')
            plt.title(ticker + ": Volume forecast using " + str(window_size) + "-day rolling median", fontsize=10)
            plt.xlabel("Date")
            plt.ylabel("Volume")
            plt.show()
            # Squared error of the forecast against the realized volume
            col = ticker + "_" + str(window_size) + "_" + str(year)
            df_[col] = np.square(df_['Predicted Volume'] - df_['Actual Volume'])
            squared_error[year].append(df_[col])
for year in [2019, 2020]:
    tx = pd.concat(squared_error[year], axis=1)
    for ticker in tickers_list:
        colnames = [col for col in tx.columns if ticker in col]
        tx_ = tx[colnames].copy()
        tx_.columns = ['10-day window', '30-day window', '60-day window']
        ty = tx_.copy()
        # Mean squared error by month
        tx_ = tx_.groupby(pd.Grouper(freq='M')).mean()
        tx_ = tx_.rename(index=lambda x: x.strftime('%B'))
        tx_.plot(kind='bar', title=ticker + ": Mean-squared error by month in " + str(year), figsize=(6, 3))
        # Mean squared error on holidays vs. ordinary business days
        holidays_list_ = pd.to_datetime(banking_holidays)
        ty_holidays = ty.loc[ty.index.isin(holidays_list_)]
        ty_working = ty[~ty.index.isin(holidays_list_)]
        holidays = ty_holidays.mean(axis=0).to_frame()
        holidays.columns = ['Holidays']
        working = ty_working.mean(axis=0).to_frame()
        working.columns = ['Business Days']
        pd.concat([holidays, working], axis=1).T.plot(kind='barh', title=ticker + ": Mean-squared error by business-days & holidays in " + str(year), figsize=(6, 3))
3. (30 pts) Daily volumes are often forecast using linear autoregressive models. Using an $N$-day sliding window, find the coefficients $A, B, C$ in the linear autoregressive models of lag 1 and lag 2 below to predict daily stock volumes for the $(N+1)$-st day in 2019 and 2020 for $N=10,30,60$. Specifically:
$$
\begin{aligned}
y_{N+1} &= A y_N+B+\epsilon_{N+1} \\
y_{N+1} &= A y_N+B y_{N-1}+C+\epsilon_{N+1}
\end{aligned}
$$
Do you think models of higher lag would be necessary? Why? Do you observe any pattern in which $N$ works best? Do you see any difference across stocks? Repeat the mean square error analysis above and comment on your findings with regard to ordinary business days vs. holidays as well as the distributional shift in 2020.
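The lag-1 and lag-2 models above are ordinary least-squares regressions of each value on its predecessors within the window. The sketch below fits them with NumPy only; the helpers `fit_ar_ols` and `predict_next` and the toy series are hypothetical names and data used for illustration, and this is not necessarily the estimator a time series library would use by default.

```python
import numpy as np

def fit_ar_ols(window, lags: int) -> np.ndarray:
    """Least-squares fit of y_t = a_1*y_{t-1} + ... + a_p*y_{t-p} + const.

    Returns [a_1, ..., a_p, const]; for lag 1 this is [A, B],
    for lag 2 it is [A, B, C] in the notation of the problem statement.
    """
    y = np.asarray(window, dtype=float)
    target = y[lags:]
    # Column k holds the lag-(k+1) values aligned with the target rows
    cols = [y[lags - k - 1:len(y) - k - 1] for k in range(lags)]
    X = np.column_stack(cols + [np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

def predict_next(window, coef: np.ndarray) -> float:
    """One-step-ahead forecast for the value right after the window."""
    lags = len(coef) - 1
    y = np.asarray(window, dtype=float)
    recent = y[::-1][:lags]  # y_N, y_{N-1}, ...
    return float(recent @ coef[:lags] + coef[-1])

# Noise-free toy data (hypothetical): y_t = 0.5 * y_{t-1} + 2
series = [10.0]
for _ in range(9):
    series.append(0.5 * series[-1] + 2.0)

coef = fit_ar_ols(series, lags=1)        # approximately [0.5, 2.0]
next_value = predict_next(series, coef)  # approximately 0.5 * series[-1] + 2
```

With $N=10$ and lag 2, only eight rows are available to estimate three coefficients, so the fitted coefficients can vary widely from one window to the next; this is worth keeping in mind when comparing window sizes.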
ticker = 'AMZN'
year = 2019
df_ = tickers_df[ticker][['Volume']]
df_.columns = ['Actual Volume']
df_.loc[:,'year'] = df_.index.year
df_ = df_[df_.year == year]
df_ = df_[['Actual Volume']]
df_
| Date | Actual Volume |
|---|---|
| 2019-01-02 | 159662000 |
| 2019-01-03 | 139512000 |
| 2019-01-04 | 183652000 |
| 2019-01-07 | 159864000 |
| 2019-01-08 | 177628000 |
| ... | ... |
| 2019-12-24 | 17626000 |
| 2019-12-26 | 120108000 |
| 2019-12-27 | 123732000 |
| 2019-12-30 | 73494000 |
| 2019-12-31 | 50130000 |
252 rows × 1 columns
from statsmodels.tsa.ar_model import AutoReg
# from sklearn.metrics import mean_squared_error
window_size = 10
t = df_[window_size:]
index = t.index[0]
tx = df_[:index]
ty = tx[:window_size]
X = ty['Actual Volume'].values
model = AutoReg(X, lags=2)
model_fit = model.fit()
print('Coefficients: %s' % model_fit.params)
Coefficients: [3.00013043e+07 2.79992030e-01 4.74079593e-01]
model_fit.predict(start=len(X),end=len(X), dynamic=False)
array([1.20537441e+08])
tx
| Date | Actual Volume |
|---|---|
| 2019-01-02 | 159662000 |
| 2019-01-03 | 139512000 |
| 2019-01-04 | 183652000 |
| 2019-01-07 | 159864000 |
| 2019-01-08 | 177628000 |
| 2019-01-09 | 126976000 |
| 2019-01-10 | 130154000 |
| 2019-01-11 | 93724000 |
| 2019-01-14 | 120118000 |
| 2019-01-15 | 119970000 |
| 2019-01-16 | 127338000 |
# Slide a full window of `window_size` days over the year. Stopping at
# len(df_) - window_size keeps every window complete and leaves a target day,
# so AutoReg is never fit on a truncated window near the end of the data.
ar2_predictions = {}
for i in range(len(df_) - window_size):
    X = df_['Actual Volume'].iloc[i:i + window_size].values
    model = AutoReg(X, lags=2)
    model_fit = model.fit()
    print('Coefficients: %s' % model_fit.params)
    # One-step-ahead forecast for the day immediately after the window
    ar2_predictions[df_.index[i + window_size]] = model_fit.predict(start=len(X), end=len(X))[0]
Coefficients: [3.00013043e+07 2.79992030e-01 4.74079593e-01]
Coefficients: [4.12654779e+07 3.24779391e-01 3.22071885e-01]
Coefficients: [3.94981072e+07 4.71758305e-02 5.52380141e-01]
Coefficients: [ 9.00110013e+07 -1.01521624e-01 2.86311734e-01]
Coefficients: [ 1.47694199e+08 -3.18947925e-01 3.77734604e-02]
...
4. (30 pts) Propose a method to improve volume prediction for banking holidays - you might need to use data for 2016, 2017 and 2018 (and, perhaps, even earlier) for that. Repeat the mean square error analysis and justify why the method that you are proposing is superior to the above.
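One possible approach, offered only as a sketch and not as the intended answer, is to estimate from earlier years how far holiday volume typically deviates from a baseline forecast and to rescale the baseline on holiday dates. The helpers `holiday_ratio` and `adjust_forecast` and all values below are hypothetical; the baseline could be any of the forecasts from the earlier questions.

```python
import pandas as pd

def holiday_ratio(actual: pd.Series, baseline: pd.Series, holidays) -> float:
    """Median of actual/baseline on past holidays (1.0 if none are available)."""
    mask = actual.index.isin(pd.to_datetime(holidays))
    ratios = (actual[mask] / baseline[mask]).dropna()
    return float(ratios.median()) if len(ratios) else 1.0

def adjust_forecast(baseline: pd.Series, holidays, ratio: float) -> pd.Series:
    """Scale the baseline forecast on holiday dates only."""
    adjusted = baseline.copy()
    mask = adjusted.index.isin(pd.to_datetime(holidays))
    adjusted.loc[mask] = adjusted.loc[mask] * ratio
    return adjusted

# Toy training history (hypothetical values): volume drops on the holiday
train_idx = pd.to_datetime(["2018-11-21", "2018-11-23", "2018-11-26"])
train_actual = pd.Series([100.0, 60.0, 100.0], index=train_idx)
train_baseline = pd.Series([100.0, 100.0, 100.0], index=train_idx)
ratio = holiday_ratio(train_actual, train_baseline, ["2018-11-23"])

# Apply the learned ratio to a later baseline forecast
test_idx = pd.to_datetime(["2019-11-27", "2019-11-29"])
test_baseline = pd.Series([200.0, 200.0], index=test_idx)
adjusted = adjust_forecast(test_baseline, ["2019-11-29"], ratio)
```

Estimating the ratio only from years before the forecast year keeps the comparison with the earlier methods fair, since no information from the evaluation period leaks into the adjustment.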
5. (10 bonus points) Use neural networks to improve the daily volume forecast above. Training neural networks can be expensive; therefore, for the purposes of this exercise, you do not need to consider the entire two-year time period. Pick a month or two and focus on improving the forecast over the classic time series models for that period. When presenting your results, elaborate on the neural network architecture used, the training data (e.g., the choice of training set size), training details (hyperparameters used, etc.), training loss, etc. Visualizations will be helpful. Were you able to "beat" the benchmark from the prior exercises in terms of prediction error?
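Whatever architecture is chosen, the volume series first has to be turned into supervised (input window, next value) pairs and split chronologically. The sketch below does this with NumPy only; `make_windows`, `chrono_split`, and the toy series are hypothetical names and data, and the resulting arrays could be fed to a PyTorch or TensorFlow model.

```python
import numpy as np

def make_windows(y, n_lags: int):
    """Build (X, target) pairs: each row of X holds n_lags consecutive values,
    and the matching target is the value that immediately follows them."""
    y = np.asarray(y, dtype=float)
    X = np.stack([y[i:i + n_lags] for i in range(len(y) - n_lags)])
    target = y[n_lags:]
    return X, target

def chrono_split(X, target, train_fraction: float = 0.8):
    """Split without shuffling so the test set lies strictly after the training set."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], target[:cut]), (X[cut:], target[cut:])

# Toy series (hypothetical values, for illustration only)
y = np.arange(12, dtype=float)
X, target = make_windows(y, n_lags=3)
(train_X, train_y), (test_X, test_y) = chrono_split(X, target, train_fraction=0.8)
```

A chronological split matters here because shuffling would let the network train on days that come after the ones it is evaluated on.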